

Search for: All records

Creators/Authors contains: "Guo, Jun"


  1. Abstract Based on the vertical Total Electron Content (TEC) data observed by the Global Navigation Satellite System in the northern hemisphere, a large area of low plasma density during summer at high latitudes, termed the decreased TEC region, was investigated statistically between 2014 and 2024. Compared with the classical depleted structures that usually occur in the nighttime F region at high latitudes during winter, the decreased TEC region is usually found in the sunlit polar cap ionosphere during summer. The decreased TEC region is predominantly located in regions above 70° magnetic latitude for moderate and high solar activity. The lower‐TEC region is biased towards the dawn and midnight sectors. Along the 18:25–06:25 Magnetic Local Time meridian, the depth of the decreased TEC region reached 7.6 TECu in 2014. The decreased TEC region is deeper for higher Kp (Kp > 2) than for low Kp (Kp ≤ 2).
    Free, publicly-accessible full text available April 28, 2026
  2. Abstract Motivation: Nucleic acid binding proteins (NABPs) play critical roles in various essential biological processes. Many machine learning-based methods have been developed to predict different types of NABPs. However, most of these studies have limited applications in predicting the types of NABPs for any given protein with unknown functions, due to several factors such as dataset construction, prediction scope, and the features used for training and testing. In addition, single-stranded DNA binding proteins (DBPs), or SSBs, have not been extensively investigated for identifying novel SSBs from proteins with unknown functions. Results: To improve prediction accuracy of different types of NABPs for any given protein, we developed hierarchical and multi-class models with machine learning-based methods and a feature extracted from the protein language model ESM2. Our results show that by combining the feature from ESM2 and machine learning methods, we can achieve high prediction accuracy of up to 95% for each stage in the hierarchical approach, and 85% for overall prediction accuracy from the multi-class approach. More importantly, besides the much improved prediction of other types of NABPs, the models can be used to accurately predict single-stranded DBPs, which are underexplored. Availability and implementation: The datasets and code can be found at https://figshare.com/projects/Prediction_of_nucleic_acid_binding_proteins_using_protein_language_model/211555.
  3. Abstract Nucleic acid-binding proteins (NABPs), including DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs), play important roles in essential biological processes. To facilitate functional annotation and accurate prediction of different types of NABPs, many machine learning-based computational approaches have been developed. However, the datasets used for training and testing as well as the prediction scopes in these studies have limited their applications. In this paper, we developed new strategies to overcome these limitations by generating more accurate and robust datasets and developing deep learning-based methods including both hierarchical and multi-class approaches to predict the types of NABPs for any given protein. The deep learning models employ two layers of convolutional neural network and one layer of long short-term memory. Our approaches outperform existing DBP and RBP predictors with a balanced prediction between DBPs and RBPs, and are more practically useful in identifying novel NABPs. The multi-class approach greatly improves the prediction accuracy of DBPs and RBPs, especially for the DBPs with ~12% improvement. Moreover, we explored the prediction accuracy of single-stranded DNA binding proteins and their effect on the overall prediction accuracy of NABP predictions. 
  4. Abstract DNA transposons have emerged as promising tools in both gene therapy and functional genomics. In particular, the Sleeping Beauty (SB) DNA transposon has advanced into clinical trials due to its ability to stably integrate DNA sequences of choice into eukaryotic genomes. The efficiency of the DNA transposon system depends on the interaction between the transposon DNA and the transposase enzyme that facilitates gene transfer. In this study, we assess the DNA-binding capabilities of variants of the SB transposase and demonstrate that the structural stability of the primary DNA-recognition subdomain, PAI, affects SB DNA-binding affinity and transposition activity. This fundamental understanding of the structure–function relationship of the SB transposase will assist the design of improved transposases for gene therapy applications. 
  5. Abstract The inversion of C3 stereochemistry in monoterpenoid indole alkaloids (MIAs), derived from the central precursor strictosidine (3S), is essential for synthesizing numerous 3R MIAs and oxindoles, including the antihypertensive drug reserpine found in Rauvolfia serpentina (Indian snakeroot) and Rauvolfia tetraphylla (devil pepper) of the plant family Apocynaceae. MIA biosynthesis begins with the reduction of strictosidine aglycone by various reductases, preserving the initial 3S stereochemistry. In this study, we identify and biochemically characterize a conserved oxidase-reductase pair from the Apocynaceae, Rubiaceae, and Gelsemiaceae families of the order Gentianales: the heteroyohimbine/yohimbine/corynanthe C3-oxidase (HYC3O) and C3-reductase (HYC3R). These enzymes collaboratively invert the 3S stereochemistry to 3R across a range of substrates, resolving the long-standing question about the origin of 3R MIAs and oxindole derivatives, and facilitation of reserpine biosynthesis. Notably, HYC3O and HYC3R are located within gene clusters in both the R. tetraphylla and Catharanthus roseus (Madagascar periwinkle) genomes, which are partially homologous to an elusive geissoschizine synthase (GS) gene cluster we also identified in these species. In R. tetraphylla, these clusters occur closely in tandem on a single chromosome, likely stemming from a single segmental duplication event, while in C. roseus, a closely related member of rauvolfioid Apocynaceae, they were later separated by a chromosomal translocation. The ancestral genomic context for both clusters can be traced all the way back to common ancestry with grapevine. Given the presence of syntenic GS homologs in Mitragyna speciosa (Rubiaceae), the GS cluster, at least in part, probably evolved at the base of the Gentianales, which split from other core eudicots up to 135 million years ago. We also show that the strictosidine biosynthetic gene cluster, required to initiate the MIA pathway, plausibly evolved concurrently. The reserpine biosynthetic cluster likely arose much later in the rauvolfioid lineage of Apocynaceae. Collectively, our work uncovers the genomic and biochemical basis for key events in MIA evolution and diversification, providing insights beyond the well-characterized vinblastine and ajmaline biosynthetic pathways.
    Free, publicly-accessible full text available January 10, 2026